Parallel Knowledge Embedding with MapReduce on a Multi-core Processor

Authors

  • Miao Fan
  • Qiang Zhou
  • Thomas Fang Zheng
  • Ralph Grishman
Abstract

This article is a first attempt to explore parallel algorithms for learning distributed representations of both entities and relations in large-scale knowledge repositories, using the MapReduce programming model on a multi-core processor. We accelerate the training process of a canonical knowledge embedding method, the translating embedding (TransE) model, by dividing a whole knowledge repository into several balanced subsets and feeding each subset to an individual core, where local embeddings are updated concurrently during the Map phase. However, this approach usually yields inconsistent low-dimensional vector representations of the same key, collected from different Map workers, which leads to conflicts when the Reduce phase merges the various vectors associated with that key. Therefore, we try several strategies for obtaining merged embeddings that not only retain the performance of the single-thread TransE on entity inference, relation prediction, and triplet classification, evaluated on several well-known knowledge bases such as Freebase and NELL, but also scale up the learning speed with the number of cores in a processor. So far, the empirical studies show that we can achieve results comparable to those of the single-thread TransE trained with the stochastic gradient descent (SGD) algorithm, and increase the training speed multiple times by adapting the batch gradient descent (BGD) algorithm to the MapReduce paradigm.
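The Map/Reduce split described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`split_balanced`, `map_worker`, `reduce_average`, `train_round`), the key layout, the hyperparameters, and the choice of plain averaging as the merge strategy are all hypothetical. The abstract only says that several merge strategies were tried. The Map workers run sequentially here for simplicity.

```python
import random

def split_balanced(triples, n_workers):
    """Deal triples round-robin so subset sizes differ by at most one."""
    return [triples[i::n_workers] for i in range(n_workers)]

def residual(emb, h, r, t):
    """Per-dimension h + r - t; TransE wants this near zero for true triples."""
    return [a + b - c for a, b, c in
            zip(emb[("ent", h)], emb[("rel", r)], emb[("ent", t)])]

def map_worker(subset, emb, entities, seed, lr=0.01, margin=1.0):
    """Map phase: update a private copy of the embeddings on one subset.

    Uses a margin-based loss with squared L2 distance (an assumption) and
    returns only the vectors this worker actually touched.
    """
    rng = random.Random(seed)
    local = {k: list(v) for k, v in emb.items()}
    touched = set()
    for h, r, t in subset:
        # Build a negative triple by corrupting either the head or the tail.
        if rng.random() < 0.5:
            nh, nt = rng.choice(entities), t
        else:
            nh, nt = h, rng.choice(entities)
        pos = residual(local, h, r, t)
        neg = residual(local, nh, r, nt)
        loss = margin + sum(x * x for x in pos) - sum(x * x for x in neg)
        if loss <= 0:
            continue  # margin already satisfied, no update
        for i, (p, n) in enumerate(zip(pos, neg)):
            local[("ent", h)][i] -= lr * 2 * p
            local[("ent", t)][i] += lr * 2 * p
            local[("rel", r)][i] -= lr * 2 * (p - n)
            local[("ent", nh)][i] += lr * 2 * n
            local[("ent", nt)][i] -= lr * 2 * n
        touched.update([("ent", h), ("ent", t), ("rel", r),
                        ("ent", nh), ("ent", nt)])
    return {k: local[k] for k in touched}

def reduce_average(emb, worker_outputs):
    """Reduce phase: resolve conflicting vectors for a key by averaging them."""
    merged = {k: list(v) for k, v in emb.items()}
    grouped = {}
    for out in worker_outputs:
        for key, vec in out.items():
            grouped.setdefault(key, []).append(vec)
    for key, vecs in grouped.items():
        merged[key] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return merged

def train_round(triples, emb, entities, n_workers=4, seed=0):
    """One Map/Reduce round; the workers are run one after another here."""
    subsets = split_balanced(triples, n_workers)
    outputs = [map_worker(s, emb, entities, seed + i)
               for i, s in enumerate(subsets)]
    return reduce_average(emb, outputs)
```

To use several cores, the list comprehension in `train_round` could be replaced with `multiprocessing.Pool.starmap` over the same arguments. Averaging is only one way to settle the conflicts the abstract mentions, and it is chosen here purely for illustration.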


Similar resources

Evaluating MapReduce for Multi-core and Multiprocessor Systems

As multi-core chips become ubiquitous, it is critical to develop parallel programming models and runtime systems that can harness their computational capabilities. In this paper, we evaluate the suitability of the MapReduce model for multi-core and multi-processor systems. MapReduce was developed by Google to program and manage data-centers with thousands of servers. It allows programmers to wr...


Bitonic-MapReduce: Optimization of MapReduce on the Cell B.E. Architecture with a Bitonic Sort Senior Honors Thesis

The Cell B.E. Architecture is a novel, heterogeneous, multi-core architecture that offers opportunities for significant performance. However, a lack of programmer familiarity with explicitly parallelizing code and difficulty using its unique software-managed memory model make writing programs for the Cell difficult, even for experienced programmers. However, if tools can be made to abstract awa...


A Study of the Optimistic Mapreduce Techniques for Energy Minimization and Performance Enhancement for Multicore Cloud Computing Applications

Multi-core architecture is established on a number of processors and has local caches (memories). When all the mandatory actions of a computer are executed on a processor which has more than one core to execute, its processor is known as multi core architecture. Multi-core processing is used to make tasks energy efficient, augment their performance and to make multiple tasks run concurrently in...


Optimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures*

MapReduce simplifies parallel programming, abstracting the responsibility of the programmer, such as synchronization and task management. The paradigm allows the programmer to write sequential code that is automatically parallelized. The MapReduce Frameworks developed for multi-core architectures provide large processing keys which consequently g...


Bind: a Partitioned Global Workflow Parallel Programming Model

High Performance Computing is notorious for its long and expensive software development cycle. To address this challenge, we present Bind: a "partitioned global workflow" parallel programming model for C++ applications that enables quick prototyping and agile development cycles for high performance computing software targeting heterogeneous distributed manycore architectures. We present applica...



Journal:
  • CoRR

Volume abs/1509.01183

Publication date 2015